.NSECP(Results)
.R1PAGE: PAGE;
.if false then start;
This chapter opens by summarizing what AM "did". Section 1 gives a
fairly high-level description of the major paths which were explored,
the concepts discovered along the way, the relationships which were
noticed, and occasionally the ones which "should" have been but weren't.
.ONCE TURN ON "{⎇"
The next section
({SECNUM⎇.2)
continues this exposition by presenting the results
of experiments which were done with (and ⊗4on⊗*) AM.
.ONCE TURN ON "{⎇"
Chapter {[2] EVALU⎇ will draw upon these results -- and others given
in the appendices -- to form conclusions about AM. Several meta-level
questions will be tackled there (e.g., "What are AM's limitations?").
.end;
.RESULT: SECNUM ;
.SSEC(What AM Did)
.QQ
Now we have seen that mathematical work is not simply mechanical, that it
could not be done by a machine, however perfect. It is not merely a question
of applying rules, of making the most combinations possible according to certain
fixed laws. The combinations so obtained would be exceedingly numerous, useless,
and cumbersome. The true work of the inventor consists in choosing among these combinations
so as to eliminate the useless ones or rather to avoid the trouble of making them,
and the rules which must guide this choice are extremely fine and delicate. It is
almost impossible to state them precisely; they are felt rather than formulated.
Under these conditions, how imagine a sieve capable of applying them mechanically?
-- Poincare'
.ESS
AM is both a mathematician of sorts, and a big computer program.
.BEGIN TURN ON "{⎇"
By granting AM more anthropomorphic qualities than it deserves, we
can describe its progress through elementary mathematics. It
rediscovered many well-known concepts, a couple interesting but
not-generally-known ones, and several concepts which were hitherto
unknown and should have stayed that way. Section
{[2]OVERV⎇.{[2]AMAMSSEC⎇, on page {[2]AMAM⎇, recaps what AM did, much
as a historian might critically evaluate Euler's work.
.if false then start;
A more
detailed prose description of everything AM did is found in Appendix
{[2]TRACES⎇.{[2]PROS⎇, beginning on page {[3]PROSP⎇.
.end;
Instead of repeating any of this descriptive prose here, Section
{SECNUM⎇.{SSECNUM⎇.1
will provide a
very brief listing of what AM did in a single good run, task by task.
.if false then start;
A much more detailed version of this same list is found in Appendix
{[2]TRACES⎇.{[2]TRATS⎇, beginning on page {[3]TRAT⎇. The task numbers
there correspond to the numbering below$$ They do ⊗4not⊗* precisely
match the task numbers accompanying the example given in Chapter
{[2]EXAM1⎇. $.
.end;
These task-by-task listings are not
complete listings of every task
AM ever attempted in any of its many runs,
but rather a trace of a single, better-than-average run of
the program.$$ In fact,
it is perhaps the best overall run. It occurred in two stages (due to
space problems; unimportant). In this particular run, AM
misses the few "very best" discoveries it ever made,
since the runs they occurred in went in somewhat different directions. It
also omits some of the more boring tasks: see, e.g., the description
of task number {[3]OMITT⎇. $ The reader may wish to consult the brief
alphabetized glossary of concept names in the previous chapter (page {[3]BRIEFC⎇).
Following this linear trace of AM's behavior is a more appropriate representation
of what it did: namely, a two-dimensional graph of that
same behavior as seen in "concept-space".
This forms Section
{SECNUM⎇.{SSECNUM⎇.2, and is found on page {[3]PBG⎇.
By under-estimating AM's sophistication, one can demand answers to
the typical questions asked about any computer program: how big is it,
how much cpu time does it use, what language is it coded in, etc. These
are
found in Section
{SECNUM⎇.{SSECNUM⎇.3.
.END
. SSSEC(Linear Task-by-task Summary of a Good Run)
.SHORTASKP: PAGE;
.SHORTASK: SSECNUM;
.BN; INDENT 3,7,0; ONCE PREFACE 2;
λλ Fill in examples of Compose. Failed, but suggested next task:
λλ Fill in examples of Set-union. Also failed, but suggested:
λλ Fill in examples of Sets. Many found (e.g., by instantiating
Set.Defn) and then more derived from those examples (e.g., by running
Union.Alg).
.SETTASK: BNN;
λλ Fill in specializations of Sets (because it was very easy to find
examples of Sets). Creation of new concepts. One, INT-Sets, is
related to "Singletons". Another, "BI-Sets", is all nests of braces
(no atomic elements).
λλ Fill in examples of INT-Sets. This indirectly led to a rise in the
worth of Equal.
λλ Check all examples of INT-Sets. All were confirmed.
AM defines the set of Nonempty INT-Sets; this is renamed "Singletons" by the user.
λλ Check all examples of Sets. To check a couple conjectures, AM
will soon look for Bags and Osets.
λλ Fill in examples of Bags.
λλ Fill in specializations of Bags. Created INT-Bags (contain just one kind of
element), and BI-Bags (nests of parentheses).
λλ Fill in examples of Osets.
λλ Check examples of Osets.
λλ Fill in examples of Lists.
λλ Check examples of Lists.
λλ Fill in examples of All-but-first.
λλ Fill in examples of All-but-last.
λλ Fill in specializations of All-but-last. Failed.
λλ Fill in examples of List-union.
λλ Fill in examples of Proj1.
λλ Check examples of All-but-first.
λλ Check examples of All-but-last.
λλ Fill in examples of Proj2.
λλ Fill in examples of Empty-structures. 4 found.
λλ Fill in generalizations of Empty-structures. Failed.
λλ Check examples of List-union.
λλ Check examples of Bags. Defined Singleton-bags.
λλ Fill in examples of Bag-union.
λλ Check examples of Proj2.
λλ Fill in examples of Set-union.
λλ Check examples of Set-union. Define λ (x,y) x∪y=x, later called Superset.
.SUBBAG: BNN;
λλ Fill in examples of Bag-insert.
λλ Check examples of Bag-insert. Range is really Nonempty bags. Isolate the results
of insertion restricted to Singletons: call them Doubleton-bags.
λλ Fill in examples of Bag-intersect.
λλ Fill in examples of Set-insert.
λλ Check examples of Set-insert. Range is always Nonempty sets.
Define λ (x,S) Set-insert(x,S)=S; i.e., set membership.
Define Doubleton sets.
λλ Fill in examples of Bag-delete.
λλ Fill in examples of Bag-difference.
λλ Check examples of Bag-intersect.
Define λ (x,y) x∩y=(); i.e. disjoint bags.
λλ Fill in examples of Set-intersect.
λλ Check examples of Set-intersect.
Define λ (x,y) x∩y=x; i.e., subset.
Also define disjoint sets: λ (x,y) x∩y=α{α⎇.
.SUPERBAG: BNN;
λλ Fill in examples of List-intersect.
λλ Fill in examples of Equal. Very difficult to find examples; this
led to:
λλ Fill in generalizations of Equal. Define
"Same-size", "Equal-CARs", and some losers.
λλ Fill in examples of Same-size.
λλ Apply an Algorithm for Canonize to the args Same-size and Equal.
AM eventually synthesizes the canonizing function "Size". AM defines
the set of canonical structures: bags of T's; this later gets renamed
as "Numbers". (A sketch of this derivation, in code, follows this task list.)
.NUMTASK: BNN;
λλ Restrict the domain/range of Bag-union. A new operation is
defined, Number-union, with domain/range entry <Number Number α→
Bag>.
λλ Fill in examples of Number-union. Many found.
λλ Check the domain/range of Number-union. Range is `Number'.
This operation is renamed "Add2".
λλ Restrict the domain/range of Bag-intersect to Numbers. Renamed
"Minimum".
λλ Restrict the domain/range of Bag-delete to Numbers. Renamed
"SUB1".
λλ Restrict the domain/range of Bag-insert to Numbers.
AM calls the new operation "Number-insert". Its
domain/range entry is <Anything Number α→ Bag>.
λλ Check the domain/range of Number-insert. This doesn't lead
anywhere.
λλ Restrict the domain/range of Bag-difference to Numbers. This
becomes "Subtract".
λλ Fill in examples of Subtract. This leads to defining the relation LEQ
(⊗6≤⊗*).$$
If a
larger number is "subtracted" from a smaller, the result is zero. AM
explicitly defines the set of ordered pairs of numbers having zero
"difference". <x,y> is in that set iff x is less than or equal to y. $
λλ Fill in examples of LEQ. Many found.
λλ Check examples of LEQ.
λλ Apply algorithm of Coalesce to LEQ. LEQ(x,x) is Constant-True.
λλ Fill in examples of Parallel-join2. Included is
Parallel-join2(Bags,Bags,Proj2), which is renamed "TIMES",
and
Parallel-join2(Structures,Structures,Proj1), a generalized
Union operation renamed "G-Union", and a bunch of losers.
.TIMTASK: BNN;
.Z7←BNN+12;
.OMITT: Z7;
λλ-- ⊗6{Z7⎇.⊗* Fill in and check examples of the operations just
created.
.BNN←Z7;
λλ Fill in examples of Coalesce. Created: Self-Compose, Self-Insert,
Self-Delete, Self-Add, Self-Times, Self-Union, etc. Also:
Coa-repeat2, Coa-join2, etc.
λλ Fill in examples of Self-Delete. Many found.
λλ Check examples of Self-Delete. Self-Delete is just Identity-op.
λλ Fill in examples of Self-Member. No positive examples found.
λλ Check examples of Self-Member. Self-member is just Constant-False.
λλ Fill in examples of Self-Add. Many found. User renames this "Doubling".
λλ Check examples of Coalesce. Confirmed.
λλ Check examples of Add2. Confirmed.
λλ Fill in examples of Self-Times.
Many found.
Renamed "Squaring" by the user.
λλ Fill in examples of Self-Compose.
Defined Squaring-o-Squaring.
Created Add-o-Add (two versions: Add21 which is
λ (x,y,z) (x+y)+z, and Add22 which is x+(y+z)). Similarly, two
versions of Times-o-Times and of
Compose-o-Compose.
λλ Fill in examples of Add21. (x+y)+z. Many are found.
λλ Fill in examples of Add22. x+(y+z). Again many are found.
λλ Check examples of Squaring. Confirmed.
λλ Check examples of Add22. Add21 and Add22 appear equivalent. But
first:
λλ Check examples of Add21. Add21 and Add22 still appear equivalent.
Merge them. So the proper argument for a generalized "Add" operation
is a Bag.
λλ Apply algorithm for Invert to argument `Add'. Define Inv-add(x) as
the set of all bags of numbers (>0) whose sum is x. Also denoted Add-1-(x).
λλ Fill in examples of TIMES21. (xy)z. Many are found.
λλ Fill in examples of TIMES22. x(yz). Again many are found.
λλ Check examples of TIMES22. TIMES21 and TIMES22 may be equivalent.
λλ Check examples of TIMES21. TIMES21 and TIMES22 still appear
equivalent. Merge them. So the proper argument for a generalized
"TIMES" operation is a Bag. Set up an analogy between TIMES and ADD,
because of this fact.
λλ Apply algorithm for Invert to argument `TIMES'. Define
Inv-TIMES(x) as the set of all bags of numbers (>1) whose product is x.
Analogous to Inv-Add.
λλ Fill in examples of Parallel-replace2. Included are
Parallel-replace2(Bags,Bags,Proj2) (called MR2-BBP2), and many losers.
.Z7←BNN+16;
λλ-- ⊗6{Z7⎇.⊗* Fill in and check examples of the operations just
created.
.BNN←Z7;
λλ Fill in examples of Compose. So easy that AM creates Int-Compose.
λλ Fill in examples of Int-Compose. The two chosen operations G,H
must be such that ran(H)⊗6ε⊗*dom(G), and ran(G)⊗6ε⊗*dom(H); both G
and H must be interesting.
Create G-Union-o-MR2-BBP2,$$ an alternate
derivation of the operation of multiplication. $
Insert-o-Delete, Times-o-Squaring, etc.
.Z7←BNN+18;
λλ-- ⊗6{Z7⎇.⊗* Fill in and check examples of the compositions just
created. Notice that G-Union-o-MR2-BBP2 is just
TIMES.
.BNN←Z7;
λλ Fill in examples of Coa-repeat2. Among them:
Coa-repeat2(Bags-of-Numbers, Add2) [multiplication again!],
Coa-repeat2(Bags-of-Numbers, Times) [exponentiation],
Coa-repeat2(Structures, Proj1) [CAR],
Coa-repeat2(Structures, Proj2) [Last-element-of], etc.
λλ Check the examples of Coa-repeat2. All confirmed.
λλ Apply algorithms for Invert to `Doubling'.
The result is called "Halving" by the user.
AM then defines "Evens".
λλ Fill in examples of Self-Insert.
λλ Check examples of Self-Insert. Nothing special found.
λλ Fill in examples of Coa-repeat2-Add2.
λλ Check examples of Coa-repeat2-Add2. It's the same as TIMES.
λλ Apply algorithm for Invert to argument `Squaring'. Define
"Square-root".
λλ Fill in examples of Square-root. Some found, but very inefficiently.
λλ Fill in new algorithms for Square-root. Had to ask user for a good one.
λλ Check examples of Square-root. Define the set of numbers
"Perfect-squares".
λλ Fill in examples of Coa-repeat2-Times. This is exponentiation.
λλ Check examples of Coa-repeat2-Times. Nothing special noticed,
unfortunately.
λλ Fill in examples of Inv-TIMES.
Many found, but inefficiently.
λλ Fill in new algorithms for Inv-TIMES. Obtained opaquely from the user.
λλ Check examples of Inv-TIMES. This task suggests the next one:
λλ Compose G-Union with Inv-TIMES. Good domain/range. Renamed
"Divisors".
λλ Fill in examples of Divisors. Many found, but not very efficiently.
λλ Fill in new algorithms for Divisors. Obtained from the user.
λλ Fill in examples of Perfect-squares. Many found.
λλ Fill in specializations of TIMES.
Times1(x)⊗6≡⊗*1⊗8#⊗*x, Times2(x)⊗6≡⊗*2x,
Times-sq is TIMES with its domain restricted to bags of perfect squares, Times-ev
takes only even arguments,
Times-to-evens requires that the result be even, Times-to-sq, ...
λλ Check examples of Divisors.
Define 0-Div, 1-Div, 2-Div, and 3-Div, the sets of numbers
whose Divisors value is the empty set, a singleton, a doubleton, and a tripleton,
respectively.
λλ Fill in examples of 1-Div.
Only one example found: "1".
Lower 1-Div.Worth.
λλ Fill in examples of 0-Div.
None found. Lower the worth of this concept.
λλ Fill in examples of 2-Div.
A nice number of examples are found. Raise 2-Div.Worth.
λλ Check examples of 2-Div. All confirmed,
but no pattern noticed.
λλ Fill in examples of 3-Div. A nice number found.
λλ Check examples of 3-Div. All confirmed. All are perfect squares.
λλ Restrict Square-root to numbers which are in 3-Div. Call this Root3.
λλ Fill in examples of Root3. Many found.
λλ Check examples of Root3. All confirmed. All are in 2-Div.
Raise their worths.
λλ Restrict Squaring to 2-divs. Call the result Square2.
λλ Fill in examples of Square2. Many found.
λλ Check the range of Square2. Always 3-Divs.
Conjecture: x has 2 divisors iff x↑2 has 3 divisors.
λλ Restrict Squaring to 3-Divs. Call the result Square3.
λλ Restrict Square-rooting to 2-Divs. Call the result Root2.
λλ Fill in examples of Square3. Many found.
λλ Compose Divisors-of and Square3. Call the result Div-Sq3.
λλ Fill in examples of Div-Sq3. Many found.
λλ Check examples of Div-Sq3.
All such examples are Same-size.
.Z7←BNN+8;
λλ-- ⊗6{Z7⎇.⊗* More confirmations and explorations of the above conjecture.
Gradually, all its ramifications lead to dead-ends (as far as AM is concerned).
.BNN←Z7;
λλ Fill in examples of Root2. None found.
Conjecture that there are none.
λλ Check examples of Inv-TIMES. Inv-TIMES always contains a singleton bag, and
always contains a bag of primes.
λλ Restrict the range of Inv-TIMES to bags of primes. Call this
Prime-Times.
λλ Restrict the range of Inv-TIMES to singletons. Called
Single-Times.
λλ Fill in examples of Prime-times. Many found.
λλ Check examples of Prime-times. Always
a singleton set.
User renames this conjecture "The unique factorization theorem".
λλ Fill in examples of Single-TIMES. Many found.
λλ Check examples of Single-TIMES. Always
a singleton set.
Single-TIMES is actually the same as Bag-insert!
λλ Fill in examples of Self-set-union. Many found.
λλ Check examples of Self-set-union. This operation is same as
Identity.
λλ Fill in examples of Self-bag-union. Many found.
λλ Check examples of Self-bag-union. Confirmed. Nothing interesting noticed.
λλ Fill in examples of Inv-ADD.
λλ Check examples of Inv-ADD. Hordes of boring conjectures, so:
λλ Restrict the domain of Inv-ADD to primes (Inv-Add-primes), to evens
(Inv-Add-evens), to squares, etc.
λλ Fill in examples of Inv-add-primes. Many found.
λλ Check examples of Inv-add-primes. Confirmed, but nothing special noticed.
λλ Fill in examples of Inv-add-evens. Many found.
λλ Check examples of Inv-add-evens. Always contains
a bag of primes.
λλ Restrict the range of Inv-Add-evens to bags of primes. Called
Prime-ADD.
λλ Restrict the range of Inv-ADD to singletons. Call that new operation
Single-ADD.
λλ Fill in examples of Prime-ADD. Many found.
λλ Check examples of Prime-ADD. Always
a nonempty set (of bags of primes).
User renames this conjecture "Goldbach's conjecture".
λλ Fill in examples of Single-ADD. Many found.
λλ Check examples of Single-ADD. Always
a singleton set. This operation is the same as
Bag-insert and Single-TIMES.
λλ Restrict the range of Prime-ADD to singletons, by analogy to
Prime-TIMES.$$ In this case, AM is asking which numbers are uniquely representable
as the sum of two primes.$ Call the new operation Prime-ADD-SING.
λλ Fill in examples of Prime-ADD-SING. Many found.
λλ Check examples of Prime-ADD-SING. Nothing special noticed.
λλ Fill in examples of Times-sq.$$ Recall that this is just TIMES restricted
to operate on perfect squares. $ Many examples found.
λλ Check domain/range of Times-sq. Is the range actually Perfect-squares?
Yes!
λλ Fill in examples of Times1. Recall that Times1(x)⊗6≡⊗*TIMES(1,x).
λλ Check examples of Times1. Apparently just a restriction of Identity.
λλ Check examples of Times-sq. Confirmed.
λλ Fill in examples of Times0.
λλ Fill in examples of Times2.
λλ Check examples of Times2. Apparently the same as Doubling.
That is, x+x=2⊗8#⊗*x. Very important.
By analogy, define Ad2(x) as x+2.
λλ Fill in examples of Ad2.
λλ Check examples of Ad2. Nothing interesting noticed.
λλ Fill in specializations of Add. Among those created are:
Add0 (x+0), Add1, Add3, ADD-sq (addition restricted to perfect squares),
Add-ev (sum of even numbers), Add-pr (sum of primes), etc.
λλ Check examples of Times0. The value always seems to be 0.
λλ Fill in examples of Times-ev.$$ Recall that Times-ev is just like
TIMES restricted to operating on even numbers. $ Many examples found.
λλ Check examples of Times-ev. Apparently all the results are Evens.
λλ Fill in examples of Times-to-ev.$$ That is, consider bags of numbers
which multiply to give an even number. $ Many found.
λλ Fill in examples of Times-to-sq. Only a few found.
λλ Check examples of Times-to-sq. All arguments always seem to be squares.
Conjec: Times-to-sq is really the same as Times-sq. Merge the two.
This is a false conjecture, but did AM no harm.
λλ Check examples of Times-to-ev. The domain always contains an even number.
λλ Fill in examples of Self-Union.
λλ Check examples of Self-Union.
λλ Fill in examples of SubSet.
λλ Check examples of SubSet.
λλ Fill in examples of SuperSet.
.ONCE TURN ON "{⎇";
λλ Check examples of SuperSet.
Conjec: Subset(x,y) iff Superset(y,x). Important.
.SUBTIE: BNN;
λλ Fill in examples of Compose-o-Compose-1. AM creates
some explosive combination (e.g.,
(Compose-o-Compose)-o-(Compose-o-Compose)-o-(Compose-o-Compose)),
some poor ones (e.g., Square-o-Count-o-ADD-1-),
and even a few -- very few -- winners (e.g., SUB1-o-Count-o-Self-Insert).
λλ Check examples of Compose-o-Compose-1.
λλ Fill in examples of Compose-o-Compose-2.$$ Recall that the difference
between this operation and the last one is merely in the order of the composing:
F-o-(G-o-H) versus (F-o-G)-o-H. $
AM recreates many of the previous tasks' operations.
λλ Check examples of Compose-o-Compose-2. Nothing noticed yet$$ Later on,
AM will use these new operations to discover the associativity of Compose. $.
.Z7←BNN+21;
λλ-- ⊗6{Z7⎇.⊗* Fill in and check examples of the losing compositions just
created.
.BNN←Z7;
λλ Fill in examples of Add-sq (i.e., sum of squares).
λλ Check domain/range entries of Add-sq. The range is not always perfect squares.
Define Add-sq-sq(x,y),
which is True iff x and y are perfect squares and their sum is a perfect
square as well.
λλ Fill in examples of Add-pr; i.e., addition of primes.
λλ Check Domain/range entries of Add-pr. AM defines the set of pairs of primes
whose sum is also a prime. This is a bizarre derivation of prime pairs.
.E
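Several pivotal steps in the trace above are compact enough to state in
code: the canonization of Equal into Same-size via "Size" (yielding
Numbers as bags of T's), Add2 as bag-union restricted to Numbers, and one
derivation of TIMES. What follows is a minimal sketch in Python rather
than AM's Interlisp; the Counter-of-T's representation is my gloss on
AM's "bags of T's", and the body of times is a gloss on
Parallel-join2(Bags,Bags,Proj2).
.B0
from collections import Counter

# "Size" is AM's canonizing function: it maps any bag onto a
# canonical bag of T's, so two bags satisfy Same-size exactly when
# they canonize identically.  The canonical bags are AM's "Numbers".
def size(bag):
    return Counter({'T': sum(bag.values())})

def add2(m, n):
    # Number-union, alias Add2: bag-union restricted to Numbers.
    return m + n                    # Counter '+' sums multiplicities

def times(m, n):
    # One derivation of TIMES: replace each T of m by a copy of n.
    return Counter({'T': m['T'] * n['T']})

three = size(Counter(['a', 'b', 'b']))   # any 3-element bag |-> 3
two   = size(Counter(['x', 'x']))
assert add2(three, two) == size(Counter('eeeee'))    # 3 + 2 = 5
assert times(three, two)['T'] == 6                   # 3 * 2 = 6
.E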
. SKIP TO COLUMN 1; SSSEC(Two-Dimensional Behavior Graph)
On the next two pages is a graph of the same "best run" which AM executed.
The nodes are concepts, and the links are actions which AM performed.
Labels on the links indicate when each action was taken, so the reader
may observe how AM jumped around. It should also be
easy to perceive from the graph
which paths of development were abandoned, which concepts ignored, and which ones
concentrated upon. These are precisely the features of AM's behavior which are
awkward to infer from a simple linear trace (as in the previous section).
In more detail, here is how to read the graph:
Each node is a concept.
To save space, these names are often highly abbreviated. For example,
"x0" is used in place of "TIMES-0".
Each concept name is surrounded by from zero to four numbers:
.BEGIN NOFILL; PREFACE 0; INDENT 20; SELECT 6;
318          288
   FROBNATION
310          291
.E
The upper right number indicates the task number (see last section) during which
examples of this concept were filled in. The lower right number tells when they
were checked.
The upper left number indicates when the Domain/range facet of that
concept was modified. Finally, the lower left number is the task number during
which some new Algorithms for that concept were obtained.
A number in parentheses indicates that the task with that number was a total
failure.
Because of the limited space, it was decided that if a concept were ever
renamed by the user, then only that newer, mnemonic name would be given in
the diagram. Thus there is an arrow from "Coalesce" to "⊗4Square⊗*", an operation
originally called "Self-Times" by AM.
Sometimes, a concept will have under it a note of the form ⊗6≡GROK⊗1.
This simply means that AM eventually discovered that the concept was
equivalent to the already-known concept "Grok", and probably forgot about
this one (merged it into the one it already knew about).
The "trail" of discovery may pick up again at that pre-existing concept.
A node written as ⊗6=GROK⊗* means that the concept was really the same as
"Grok", but AM never investigated it enough to notice this.
The arrows indicate the creation of new concepts. Thus an arrow leading
to concept "Frobnate" indicates how that concept was created. An arrow directed
away from Frobnate points to a concept created as, e.g., a specialization or an
example of Frobnate. No arrowheads are
in practice necessary:
all arrows are directed ⊗4downwards⊗*.
The arrows may be labelled, indicating precisely what they represent (e.g.,
composition, restriction) and what the task number was when they occurred.
For space reasons, the following convention has proven necessary:
if an arrow emanating from C is un-numbered,
it is assumed to have occurred at the same time
as the arrow to its immediate left which also points from C;
if all the arrows
emanating from C have no number, then all their times of occurrence are
assumed to be the ⊗4lower right⊗*$$
This is often true because many concepts are created
while checking examples of some known concept.
$ number of C.
Finally, if C has no lower right number, the arrow is assumed to have the
value of the upper right number of C.
An unlabelled arrow is assumed to be an act of
Specialization or the creation of
an Example.$$
It should be clear in each context which is happening. If not, refer to the
short trace in the preceding section, and look up the appropriate task number.
$ Labels, when they do occur, are given in capitals and small letters;
concept names (nodes) are by contrast in all capitals.
.ONCE TURN ON "{⎇";
All the numbers correspond to those given to the tasks
in the task-by-task traces presented
in the last section (p. {[3]SHORTASKP⎇).
.if false then start;
and in Appendix {[2]TRACES⎇
(p. {[3]TRAT⎇).
.end;
The first part of this graph (presented below) contains static structural
(and ultimately numerical) concepts which were studied by AM:
.SKIP 19;
The rest of the graph (presented on the next page) deals with activities which were
investigated:
.SKIP TO COLUMN 1;
.PBG: PAGE;
( Paste Concept Development Behavior Graph here. )
. SKIP TO COLUMN 1; SSSEC(AM as a Computer Program)
Viewed as a large LISP program, AM holds very little of interest.
There is the usual battery of customized functions (e.g., a conditional
PRINT function), the storage hacks (special emergency garbage-collection
routines, which know which facets are expendable), the time hacks
(omnisciently arranging the clauses in a conjunction so that the one most
likely to fail comes first), and the bugs (if the user renames a concept
while it is the current one being worked on,
there is a 5% chance of AM entering an infinite loop).
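The "time hack" deserves one concrete line of explanation: if the clause
most likely to fail is tested first, a conjunction that is going to fail
usually fails cheaply. A minimal sketch in Python, with hypothetical
success-probability estimates (the text above does not say how AM
obtained them):
.B0
def order_conjuncts(clauses):
    # clauses: list of (predicate, estimated probability of success);
    # sort so the clause most likely to fail is tested first.
    return sorted(clauses, key=lambda c: c[1])

def holds(clauses, x):
    return all(pred(x) for pred, _ in order_conjuncts(clauses))

clauses = [(lambda b: len(b) > 0,               0.9),
           (lambda b: all(e == 'T' for e in b), 0.1)]  # tried first
print(holds(clauses, ['T', 'T']))       # True: the rare success case
.E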
Below are listed a few parameters of the system, although I doubt that
they hold any theoretical significance. The reader may be curious about how
big AM is, how long it takes to execute, etc.
Machine: SUMEX, PDP-10, KI-10 uniprocessor, 256k core memory.
Language: Interlisp, January '75 release,
which occupies 140k of the total 256k,
but which provides a surplus "shadow space"
of 256k additional words available for holding compiled code.
AM support code:
200 compiled (not block-compiled) utility routines, control routines, etc.
They occupy roughly 100k, but all are pushed into the shadow space.
AM itself: 115 concepts, each occupying about .7k
(about two typed pages, when Pretty-printed with indentation).
Facet/entries stored as
property/value on the property list of atoms whose names are concepts' names.$$
Snazzy feature:
Executable entries on facets (e.g., an entry on Union.Alg) are stored uncompiled
until the first time they are actually called on, at which time they are compiled
and then executed. $
Each concept has about 8 facets filled in.
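The footnoted "snazzy feature" is compile-on-first-use with caching. A
small Python analogue, with dicts standing in for Interlisp property
lists (the Union.Alg entry shown is a made-up stand-in):
.B0
concepts = {'Union': {'Alg': 'lambda s1, s2: set(s1) | set(s2)'}}
compiled = {}

def run_facet(concept, facet, *args):
    key = (concept, facet)
    if key not in compiled:
        # First call: "compile" the stored source, then cache it;
        # every later call runs the cached executable version.
        compiled[key] = eval(concepts[concept][facet])
    return compiled[key](*args)

print(run_facet('Union', 'Alg', {1, 2}, {2, 3}))     # {1, 2, 3}
.E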
Heuristics are tacked onto the facets of the concepts. The more general
the concept, the more heuristic rules it has attached to it.$$
This was not done consciously,
and may or may not hold some theoretical significance. $
"Any-concept" has 121 rules; "Active concept" has 24; "Coalesce" has 7;
"Set-Insertion" has none. There are 250 heuristic rules in all, divided into
4 flavors (Fillin, Check, Suggest, Interestingness).
Although the mean number of rules is therefore only about 2.2 (i.e.,
less than 1 of each flavor) per
concept,
the standard deviation of this is a whopping 127.4. The average number of heuristics
(of a given flavor)
encountered rippling upward from a randomly-chosen concept C (along the
network of generalization links) is about 35, even though the mean path length
is only about 4.$$ If the heuristics were homogeneously distributed among the
concepts, the number of heuristics (of a given type) along a typical path
of length 4 would only be about 2, not 35. If all the heuristics were tacked
onto Anything and Any-concept, the number encountered in any path would be
75, not 35. $
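The "rippling" computation itself is simple to state. A sketch over a toy
generalization network (the network and rule names below are
hypothetical; in AM the real distribution hangs 121 rules on Any-concept,
24 on Active, and so on down):
.B0
genl  = {'Set-Insertion': ['Insertion'], 'Insertion': ['Operation'],
         'Operation': ['Active'], 'Active': ['Any-concept'],
         'Any-concept': []}
rules = {'Set-Insertion': [], 'Insertion': ['h1'],
         'Operation': ['h2', 'h3'], 'Active': ['h4'],
         'Any-concept': ['h5', 'h6', 'h7']}

def ripple_up(c, seen=None):
    # Collect the heuristics attached to c and to every concept
    # reachable from c along generalization links.
    seen = set() if seen is None else seen
    if c in seen:
        return []
    seen.add(c)
    found = list(rules.get(c, []))
    for g in genl.get(c, []):
        found += ripple_up(g, seen)
    return found

print(ripple_up('Set-Insertion'))        # h1 through h7
.E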
The total number of jobs executed in a typical run (from scratch) is about 200.
The run ends because of space problems, but AM's performance begins to degrade
near the end anyway.
"Final" state of AM: 300 concepts, each occupying about 1k. Many are swapped out
onto disk.
Number of winning concepts discovered: 25 (estimated).
Number of acceptable concepts defined: 100 (est.).$$ For a list of most of the
`winners' and `acceptables', see the final section in Appendix {[2]ALLCON⎇,
page {[3]NEWCLIST⎇. $
Number of losing concepts unfortunately worked on: 60 (est.).
The original 115 concepts have grown to an average size of 2k.
Each concept has about 11 facets filled in.
About 30 seconds of cpu time were allocated to each task, on the average,
but the task typically used only about 18 seconds before quitting.
Total CPU time for a run is about 1 hour. Total cpu time consumed by this
research project was about 500 cpu hours.
Real time: about 1 minute per task, 2 hours per run.
The idea for AM was formulated in the Fall of 1974, and AM was coded in the
summer of 1975.
Total time consumed
by this project to date has been about 2600 man-hours: 700 for planning,
500 for coding, 600 for modifying and debugging and experimenting,
700 for writing the thesis, and 100 for editing it into book form.
AM was working by itself:
it received no help from the user, and all its
concepts' Intuitions facets had been removed.
.SKIP 1
.SSEC(Experiments with AM)
.EXPT: SECNUM;
.EXPTSSEC: SSECNUM;
.EXPTPAGE: PAGE;
One valuable aspect of AM is that it is amenable to many kinds of
interesting experiments. Although AM is too ⊗4ad hoc⊗* for numerical
results to have much significance, the qualitative results perhaps do
have some valid things to say about research in elementary
mathematics, about automating research, and at least about the
efficacy of various parts of AM's design.
This section will explain what it means to perform an experiment on
AM, what kinds of experiments are imaginable, which of those are
feasible, and finally will describe the many experiments which were
performed on AM.
By modifying AM in various ways, its behavior can be altered, and the
⊗4quality⊗* of its behavior will change as well. As a drastic
example, one experiment involved forcing AM to select the next task to work on
⊗4randomly⊗* from the agenda, not the top task each time. Needless to say,
the performance was very different from usual.
By careful planning, each experiment can tell us something new about
AM: how valuable a certain piece of it is, how robust a certain
scheme really is, etc. The results of these experiments would then
have something to contribute to a discussion of the "real
intelligence" of AM (e.g., what features were superfluous), and
contribute to the design of the "next" AM-like system. Generalizing
from those results, one might suggest some hypotheses about the
larger task of automated math research.
Let's cover the different ⊗4kinds⊗* of experiments one could perform
on AM:
(i) Remove individual concept modules, and/or individual heuristic
rules. Then examine how AM's performance is degraded. AM should
operate even if most of its heuristic rules and most of its concept
modules were excised.
If the remaining fragment of AM is too small, however,
it may not be able to find anything interesting to do. In fact, this
situation was actually encountered experimentally, when the first few
partially complete concepts were inserted. If only a little bit of AM
is removed, the remainder will in fact keep operating without this
"uninteresting collapse". The converse situation should also hold:
although still functional with any concept module unplugged, AM's
performance ⊗4should⊗* be noticeably degraded. That is, while not
indispensable, each concept should nontrivially help the others. The
same holds for each individual heuristic rule.
When a piece of AM is removed,
which concepts does
AM then "miss" discovering? Is the removed concept/heuristic later
discovered anyway by those which are left in AM? This should
indicate the importance of each kind of concept and rule which AM
starts with.
(ii) Vary the relative weights given to features by the criteria
which judge aesthetics, interestingness, worth, utility, etc. See
how important each factor is in directing AM along successful routes.
In other words, vary the little numbers in the formulae (both the
global priority-assigning formula and the local
reason-rating
ones inside heuristic rules). One
important result will be some idea of the robustness or "toughness"
of the numeric weighting factors. If the system easily collapses, it
was too finely tuned to begin with.
(iii) Add several new concept modules (including new heuristics
relevant to them) and
see if AM can work in some unanticipated field of mathematics (like
graph theory or calculus or plane geometry). Do earlier achievements
-- concepts and conjectures AM synthesized already -- have any
impact in the new domain? Are some specialized heuristics from the
first domain totally wrong here? Do all the old general heuristics
still hold here? Are they sufficient, or are some "general"
heuristics needed here which weren't needed before? Does AM "slow
down" as more and more concepts get introduced?
(iv) Try to have AM
develop nonmathematical theories (like elementary physics, or program
verification). This might require limiting AM's freedom to
"ignore a given body of data and move on to something more
interesting". The exploration of very non-formalizable fields (e.g.,
politics) might require much more than a small augmentation of AM's
base of concepts. For some such domains, the "Intuitions" scheme, which had to
be abandoned for math, might prove valid and valuable.
(v) Add several new concepts dealing with proof, and of course add
all the associated heuristic rules. Such rules would advise AM on the
fine points of using various techniques of proof/disproof: when to
use them, what to try next based on why the last attempt failed, etc.
See if the ⊗4kinds⊗* of discoveries AM makes are increased.
Several experiments (of types i, ii, and
iii above)
were
set up and performed on AM. We're now ready to examine each of them
in detail. The following points are covered for each experiment:
.BN
λλ How was it thought of?
λλ What will be gained by it? What would be the implications of the
various possible outcomes?
λλ How was the experiment set up? What preparations/modifications had
to be made? How much time (man-hours) did it take?
λλ What happened? How did AM's behavior change? Was this expected?
Analysis.
λλ What was learned from this experiment? Can we conclude anything
which suggests new experiments (e.g., use a better machine, a new
domain) or which bears on a more general problem that AM faced (e.g.,
a new way to teach math, a new idea about doing math research)?
.E
. SSSEC(Must the Worth numbers be finely tuned?)
.EX3PAGE: PAGE;
Each of the 115 initial concepts is supplied (by the author) with a
number between 0 and 1000, stored as its Worth facet, which is
supposed to represent
the overall value of the concept. "Compose" has a higher initial
Worth than "Structure-delete", which is higher than "Equality"$$ As AM
progresses, it notices something interesting about Equality every now
and then, and pushes its Worth value upwards. $.
Frequently, the priority of a task involving C depends on the overall
Worth of C.
How sensitive is
AM's behavior to the initial settings of the Worth facets? How
finely tuned must these initial Worth values be?
This experiment was thought of because of the `brittleness' of many other
AI systems, the amount of fine tuning needed to elicit coherent behavior.
For example, see the discussion of limitations of PUP6, in [Lenat 75b].
The author believed that AM was very resilient in this regard, and that
a demonstration of that fact would increase credibility in the power of the ideas
which AM embodies.
To test this, a simple experiment was performed. Just before starting
AM, the mean value of all concepts' Worth values was computed. It turned out to be
roughly 200. Then each concept had its Worth reset to the value 200.$$
The initial spread of values had previously been from 100 to 600. $
This
was done "by hand", by the author, in a matter of seconds. AM was
then started and run as if there were nothing amiss, and its behavior
was watched carefully.
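In code, the entire modification is a two-line loop (a commented-out
passage below gives the Interlisp one-liner that does the same job);
concept records are sketched here as Python dicts with toy Worth values:
.B0
concepts = {'Compose': {'Worth': 400}, 'Equality': {'Worth': 120}}
for c in concepts.values():
    c['Worth'] = 200       # flatten every initial Worth to the mean
.E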
What happened? By and large, the same major discoveries were made --
and missed -- as usual, in the same order as usual. But whereas AM
proceeded fairly smoothly before, with little superfluous activity, it
now wandered quite blindly for long periods of time, especially at
the very beginning. Once AM "hooked into" a line of productive
development, it followed it just as always, with no noticeable
additional wanderings. As one of these lines of developments died
out, AM would wander around again, until the next one was begun.
It took roughly three times as long for each major discovery to occur
as normal. This "delay" got shorter and shorter as AM developed
further. In each case, the tasks preceding the discovery and
following it were pretty much the same as normal; only the tasks
"between" two periods of development were different -- and much more
numerous.
.if false then start;
The precise numbers involved would probably be more
misleading than helpful, so they will not be given$$ Any reader who
wishes to perform this experiment can simply say [MAPC CONCEPTS
'(LAMBDA (c) (SETB c WORTH 200] to Interlisp, just before typing
(START) to begin AM. $.
.end;
The reader may be interested to learn that the Worth values of many
of the concepts -- and most of the new concepts -- ended up very
close to the same values that they achieved in the original run.
Overrated concepts were investigated and proved boring; underrated
concepts had to wait longer for their chances, but then quickly proved
interesting and had their Worth facets boosted.
The conclusion I draw from this change in behavior is that the Worth
facets are useful for making blind decisions -- where AM must choose
based only on the overall worths of the various concepts in its
repertoire. Whenever a specific reason existed, it was far more
influential than the "erroneous" Worth values. The close, blind,
random decisions occur between long bursts of specific-reason-driven
periods of creative work.$$ Incidentally, GPS behaved just this same way.
See, e.g., [Newell&Simon 72]. $
The general answer, then, is ⊗4No⊗*, the initial settings of the Worth
values are not crucial. Guessing reasonable initial values for them is
merely a time-saving device.
This suggests
an interesting research problem: what impact does
the ⊗4quality⊗* of initial starting values have on humans? Give several bright
undergraduate math majors the same set of objects and operators to play with,
but tell some of them (i) nothing, tell others (ii) that a certain few
pieces of the system are very promising, and for still others (iii)
emphasize a different subset of the objects and operators.
How does "misinformation" impede the humans? How about no information?
Have them give verbal protocols about where they are focussing their attention,
and why.
Albeit at a nontrivial cost, the Worth facets did manage to correct
themselves by the end of a long$$ A couple cpu hours, about a
thousand tasks total selected from the agenda $ run. What would
happen if the Worth facets of those 115 concepts were not only
initialized to 200, but were held fixed at 200 for the duration of
the run?
In this case, the delay still subsided with time. That is, AM
still got more and more "back to normal" as it progressed onward. The reason
is that AM's later work dealt with concepts like Primes,
Square-root, etc., which were so far removed from the initial base of
concepts that the initial concepts' Worths were of little consequence.
Even more drastically, we could force all the Worth facets of all
concepts -- even newly-created ones -- to be kept at the value 200
forever. In this case, AM's behavior doesn't completely disintegrate,
but that delay factor actually increases with time: apparently, AM
begins to suffer from the exponential growth of "things to do" as its
repertoire of concepts grows linearly. Its purposiveness, its
directionality depends on "focus of attention" more and more, and if
that feature is removed, AM loses much of its rationality. A factor
of 5 delay doesn't sound that bad "efficiency-wise", but the actual
apparent behavior of AM is as staccato bursts of development,
followed by wild leaps to unrelated concepts. AM no longer can
"permanently" record its interest in a certain concept.
So we conclude that the Worth facets are (i) not finely tuned, yet
(ii) provide important global information about the relative values
of concepts. If the Worth facets are completely disabled, the
rationality of AM's behavior hangs on the slender thread of "focus of
attention".
. SSSEC(How finely tuned is the Agenda?)
.RTEXSSSEC: SSSECNUM;
.COMMENT This assumes that expt number = sssec number;
.RTEXNO: SSSECNUM;
The top few candidates on the agenda always appear to be reasonable
(to me). If I work with the system, guiding it, I can cause it to
make a few discoveries it wouldn't otherwise make, and I can cause it
to make its typical ones much faster (about a factor of 2). Thus the
⊗4very⊗* top task is not always the best.
If AM randomly selects one of the top 20 or so tasks on the agenda
each time, what will happen to its behavior? Will it disintegrate,
slow down by a factor of 10, slow down slightly,...?
This experiment required only a few seconds to set up, but demanded a
familiarity with the LISP functions which make up AM's control
structure. At a certain point, AM asks for Best-task(Agenda).
Typically, the LISP function Best-task is defined as CAR -- i.e.,
pick the first member from the list of tasks. What I did was to
redefine Best-task as a function which randomly selected n from the
set {1,2,...,20⎇, and then returned the n⊗Ath⊗* member of the
job-list.
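A sketch of that redefinition (the task records and the already-sorted
agenda are assumed; only the selection function changes):
.B0
import random

def best_task(agenda):
    return agenda[0]              # normal AM: CAR of the job-list

def random_top_task(agenda, window=20):
    # The experiment: pick uniformly among the top `window' tasks.
    return agenda[random.randrange(min(window, len(agenda)))]
.E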
If you watch the top job on the agenda, it will take about 10 cycles
until AM chooses it. And yet there are many good, interesting,
worthwhile jobs sprinkled among the top 20 on the agenda, so AM's
performance is cut by merely a factor of 3, as far as cpu time per
given major discovery. Part of this better-than-20 behavior is due
to the fact that the 18⊗Ath⊗* best task had a much lower priority
rating than the top few, hence was allocated much less cpu time for
its quantum than the top task would have received. Whether it
succeeded or failed, it used up very little time. Since AM was
frequently working on a low-value task, it was unwilling to spend
much time or space on it. So the mean time allotted per task fell to
about 15 seconds (from the typical 30 secs). Thus, the "losers" were
dealt with quickly, so the detriment to cpu-time performance was
softened.
.if false then start;
Yet AM is much less rational in its sequencing of tasks. A topic will
be dropped right in the middle, for a dozen cycles, then picked up
again. Often a "good" task will be chosen, having reasons all of
which were true 10 cycles ago -- and which are clearly superior to
those of the last 10 tasks. This is what is so annoying to human
onlookers.
.end;
To carry this investigation further, another experiment was carried
out. AM was forced to alternate between choosing the top task on the
agenda, and a randomly-chosen one. Although its rate of discovery
was cut by less than half, its behavior was almost as distasteful to
the user as in the last (always-random) experiment.
.ALTEXP: PAGE;
⊗2↓_Conclusion_↓⊗*: Picking (on the average) the 10th-best candidate
impedes progress by a factor less than 10 (about a factor of 3), but
it dramatically degrades the "sensibleness" of AM's behavior, the
continuity of its actions. Humans place a big value on absolute
sensibleness, and believe that doing something silly 50% of the time
is ⊗4↓_much_↓⊗* worse than half as productive as always doing the
next most logical task.
Corollary: Having 20 multi-processors simultaneously execute the top
20 jobs will increase the rate of "big" discoveries,
but not by a
full factor of 20 -- more like a factor of 3.
Another experiment in this same vein was done, one which was designed
to be far more crippling to AM. Be-threshhold was held at 0 always,
so ⊗4any⊗* task which ever got proposed was kept forever on the
agenda, no matter how low its priority. The Best-task function was
modified so it randomly selected any member of the list of jobs. As a
final insult, the Worth facets of all the concepts were initialized
to 200 before starting AM.
Result: Many "explosive" tasks were chosen, and the number of new
concepts increased rapidly. As expected, most of these were real
"losers". There seemed no rationality to AM's sequence of actions,
and it was quite boring to watch it floundering so. The typical
length of the agenda was about 500, and AM's performance was "slowed"
by at least a couple orders of magnitude. A more subjective measure
of its "intelligence" would say that it totally collapsed under this
random scheme.
↓_⊗2Conclusion:⊗*_↓ Having an unlimited number of processors
simultaneously execute all the jobs on the agenda would increase the
rate at which AM made big discoveries, at an ever accelerating pace
(since the length of the agenda would grow exponentially).
Having a uniprocessor ⊗4simulate⊗* such parallel processing would be
a losing idea, however. The truly "intelligent" behavior AM exhibits
is its plausible sequencing of tasks.
. SSSEC(How valuable is tacking reasons onto each taskα?)
Let's dig inside the agenda scheme now. One idea I've repeatedly emphasized
is the attaching of reasons to the tasks on the agenda,
and using those reasons and their ratings to compute the overall
priority value assigned to each task. An experiment was done to
ascertain the amount of intelligence that was emanating from that
idea.
The global formula assigning a priority value to each job was
modified. We let it still be a function of the reasons for the job,
but we "trivialized" it: the priority of a job was computed as simply
the number of reasons it has (normalized by multiplying by 100, and
cut-off if over 1000).
This raised the new question of what to do if several jobs all have
the same priority. In that case, I had AM execute them in
stack-order (most recent first)$$ Why? Because (i) it sounds right
intuitively to me, (ii) this is akin to human focus of attention, and
mainly because (iii) this is what AM did anyway, with no extra
modification. $.
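A sketch of the trivialized formula together with the LIFO tie-breaking
(task records are assumed to carry their list of reasons; recency falls
out of a stable sort over the reversed, creation-ordered task list):
.B0
def priority(task):
    # Trivialized: 100 per reason, cut off at 1000.
    return min(100 * len(task['reasons']), 1000)

def order_agenda(tasks):
    # tasks arrive in creation order; reversing first makes the
    # stable sort break priority ties most-recent-first (LIFO).
    return sorted(reversed(tasks), key=priority, reverse=True)

agenda = [{'name': 'Check examples of C', 'reasons': ['r1']},
          {'name': 'Fill in Coalescings', 'reasons': ['r1', 'r2', 'r3']}]
print([t['name'] for t in order_agenda(agenda)])
.E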
Result: I secretly expected that this wouldn't make too much
difference on AM's apparent level of directionality, but such was
definitely not the case. While AM opened by doing tasks which were
far more interesting and daring than usual (e.g., filling in various
Coalescings right away), it soon became obvious that AM was being
swayed by hitherto trivial coding decisions. Whole classes of tasks
-- like Checking Examples of C -- were never chosen, because they
only had one or two reasons supporting them. Previously, one or two
good reasons were sufficient. Now, tasks with several poor reasons
were rising to the top and being worked on. Even the LIFO (stack)
policy for resolving ties didn't keep AM's attention focussed.
⊗2↓_Conclusion:_↓⊗* Unless a conscious effort is made to ensure that
each reason really will carry roughly an equal amount of semantic
impact (charge, weight), it is not acceptable merely to choose tasks
on the basis of how many reasons they possess. Even in those
constricted equal-weight cases, the similarities between reasons
supporting a task should be taken into account.
.if false then start;
Another experiment, not yet performed, will pin down the value of this
rule-attaching idea even more precisely. A threshhold value -- say 400 --
will be fixed. Any reason whose rating is above that threshhold will be
called a ⊗4good⊗* reason, and every other reason will be a minor reason.
Then tasks will be ordered by the number of good reasons they possess,
and ties will be broken by the number of minor reasons.
Still another experiment would be to randomly pick any task with at least
one good reason.
.end;
. SSSEC(What if certain concepts are eliminated/added?)
Feeling in a perverse mood one day, I eliminated the concept
"Equality" from AM, to see what it would then do. Equality was a key
concept, because AM discovered Numbers via the technique of
generalizing the relation "Equality" (exact equality of 2 given
structures, at all internal levels). What would happen if we
eliminate this path? Will AM rederive Equality? Will it get to
Cardinality via another route? Will it do some set-theoretic things?
Result: Rather disappointing. AM never did re-derive Equality, nor
Cardinality. It spent its time thrashing about with various flavors
of data-structures (unordered vs. ordered, multiple-elements allowed
or not, etc.), deriving large quantities of boring results about
them. Very many composings and coalescings were done, but no exciting
new operations were produced.
.if false then start;
It is expected that eliminating other, less central concepts than
Equality will do less damage to AM's progress. The reader is invited
to try such experiments himself.
To eliminate a concept, like equality, one need merely type
⊗4KILB(OBJ-EQUALITY$$ To find out the precise PNAME of each concept,
just type ⊗4CONCEPTS⊗*. $)⊗1 at the beginning of the session, before
typing ⊗4(START)⊗*.
.end;
A kinder type of experiment would be to ⊗4add⊗* a few concepts.
One such experiment was done: the addition of Cartesian-product.
This operation, named C-PROD, accepts two sets as arguments and
returns a third set as its value: the Cartesian product of the first
two.
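A sketch of the inserted concept, and of why TIMES falls out of it on
canonical bags (the indexed-T rendering of a canonical bag is my own
device, used so the product's elements stay distinct):
.B0
from itertools import product

def c_prod(s1, s2):
    # C-PROD: the Cartesian product of two sets.
    return set(product(s1, s2))

def canonical_bag(n):
    # The number n, rendered as a set of n distinguishable T-slots.
    return {('T', i) for i in range(n)}

# Restricted to canonical bags, the size of the product is the
# product of the sizes -- i.e., TIMES.
assert len(c_prod(canonical_bag(3), canonical_bag(4))) == 12
.E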
Result: The only significant change in AM's behavior was that TIMES
was discovered first as the restriction of C-PROD to Canonical-Bags.
When it soon was rediscovered in a few other guises, its Worth was
even higher than usual. AM spent even more time exploring concepts
concerned with it, and deviated much less for quite a long time.
Synthesis of the above experiments: It appears that AM may really be
more specialized than expected; AM may only be able to forge ahead along one or two
main lines of development -- at least if we demand it make very
interesting, well-known discoveries quite frequently. Removing
certain key concepts can be disastrous. On the other hand, adding
some carefully-chosen new ones can greatly enhance AM's
directionality (hence its apparent intelligence).
Conclusion: In its current state,
AM is thus seen to be ⊗4minimally competent⊗*: if any
knowledge is removed, it appears much less intelligent; if any is added, it
appears slightly smarter.
Suggestion for future research:
A hypothesis, which should be tested experimentally, is that the importance
of the presence of each individual
concept decreases as the number of -- and ⊗4depth⊗* of -- the
synthesized concepts increase.
That is, any excision would eventually "heal over", given enough time.
The failure of AM to verify this may be due to the relatively small amount of
development in toto (an hour of cpu time, a couple hundred new concepts, a few
levels deeper than the starting ones).
.if false then start;
. SSSEC(What if certain heuristics are tampered with?)
The class of experiments
described by this section's heading
should prove entertaining, but it will
probably be difficult to learn from their results.
Why is this? Some of the heuristics were added to correct a specific
problem; removing them would simply re-initiate that problem. Others
were never actually used by AM, so their deletion would have no
effect. If AM enlarged the range of what it worked on, their absence
might then be felt.
What good would these experiments be, then? We might learn something
about the "redundancy of reasoning chains". We'd stop AM just before
it made a big discovery, remove the heuristic rule it was about to
use, and see if it ever makes that big discovery anyway, later on. If
not, perhaps the discarded rule was very important, or there are
alternate rules which exist but haven't been inserted in AM. If the
same discovery is made by an alternate route, does that indicate an
unexpected duplication of heuristic knowledge? If heuristic H2 is
used now, instead of H1, does that suggest a new meta-rule: "if you
want to apply one of H1/H2 but can't, see if the other rule ⊗4can⊗* be
applied."? Is that last sentence really a Meta-meta-rule?
Before this discussion enters an infinite loop, I'd better extract
myself -- and the reader -- by commenting that there may be an idea
in all this, perhaps of use to whoever writes Meta-AM. It was
decided not to carry out a systematic series of experiments of this
type until AM is much further developed in abilities.
.end;
.if false then start; SKIP TO COLUMN 1; end;
. SSSEC(Can AM work in a new domainα: Plane Geometry?)
.QQ
A true strategy should be domain-independent.
-- Adams
.ESS
As McDermott points out [McDermott 76], just labelling a
bunch of heuristics `Operation heuristics' doesn't suddenly make them
relevant to any operation; all it does is give that impression to a
human who looks at the code (or a description of it). Since the
author hoped that the labelling really was fair, an experiment was
done to test this. Such an experiment would be a key determiner of
how general AM is.
How might one demonstrate that the "Operation" heuristics really
could be useful for dealing with any operation, not just the ones
already in AM's initial base of concepts?
One way would be to pick a new domain, and see how many old
heuristics contribute to -- and how many new heuristics have to be
added to elicit -- some sophisticated behavior in that domain. Of
course, some new primitive concepts would have to be introduced
(defined) to AM.
.GEOEX: PAGE;
Only one experiment of this type was attempted. The author added a
new base of concepts to the ones already in AM. Included were:
Point, Line, Angle, Triangle, Equality of points/lines/angles/triangles.
These simple plane geometry notions were sufficiently removed from
set-theoretic ones that those pre-existing specific concepts would be
totally irrelevant; on the other hand, the general concepts -- the
ones with the heuristics attached -- would still be just as relevant:
Any-concept, Operation, Predicate, Structure, etc.
For each new geometric concept, the only facet filled in was its
Definition. For the new predicates and operators, their Domain/range
entries were also supplied.
No new heuristics were added to AM.
Results: fairly good behavior. AM was able to find examples of all
the concepts defined, and to use the character of the results of
those example searches to determine intelligent courses of action.
AM derived congruence and similarity of triangles, and several other
well-known simple concepts. An unusual result was the repeated
derivation of the idea of "timberline". This is a predicate on two
triangles: Timberline(T1,T2) iff T1 and T2 have a common angle, and
the sides opposite that angle in the two triangles are parallel:
.B0
⊗1 ⊗*          A
         /\
        /  \
       /    \
      /      \        ⊗3Timberline(ABC,ADE)⊗*
     /        \
  B /∂∂∂∂∂∂∂∂∂∂\ C
   /            \
  /              \
 /                \
D∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂E
.E
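Stated with coordinates, the predicate is easy to pin down. A Python
sketch (integer coordinates are assumed so the parallelism test is
exact; the shared angle at A is taken as given, as in the figure):
.B0
def parallel(p, q, r, s):
    # Segments pq and rs are parallel iff the cross product of
    # their direction vectors vanishes.
    dx1, dy1 = q[0] - p[0], q[1] - p[1]
    dx2, dy2 = s[0] - r[0], s[1] - r[1]
    return dx1 * dy2 - dx2 * dy1 == 0

def timberline(a, b, c, d, e):
    # Triangles ABC and ADE share the angle at A (B on segment AD,
    # C on segment AE); test whether BC is parallel to DE.
    return parallel(b, c, d, e)

A, B, C, D, E = (0, 4), (-1, 2), (1, 2), (-2, 0), (2, 0)
print(timberline(A, B, C, D, E))    # True, as in the figure above
.E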
Since AM kept rederiving this in new ways, it seems surprising that
there is no very common name for the concept. It could be that AM is
using techniques which humans don't -- at least, for geometry.
The only new bit of knowledge that came out of this experiment was a
"use" for Goldbach's conjecture: any angle (0-180 degrees) can be
built up (to within 1 degree) as the sum of two angles of prime
degrees (<180). This result is admittedly esoteric at best, but is
nonetheless worth reporting.
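The result is easy to machine-check. A sketch (the least possible sum
is 2+2=4, so angles below 3 degrees are necessarily out of reach):
.B0
def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

primes = [p for p in range(2, 180) if is_prime(p)]  # prime degrees <180

def buildable(angle, tol=1):
    # Is `angle' within `tol' degrees of p + q, for primes p, q?
    return any(abs(p + q - angle) <= tol
               for p in primes for q in primes if p <= q)

print([a for a in range(181) if not buildable(a)])  # only tiny angles
.E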
The total effort expended on this experiment was: a few months of
subconscious processing, ten hours of designing the base of concepts
to insert, ten hours inserting and debugging them. The whole task
took about two days of real time.
The conclusion to be drawn is that heuristics really can be generally
useful; their attachment to general-sounding concepts is not an
illusion.$$
Or: it's a very good illusion! But note: if this phenomenon is repeatable
and useful, then (like Newtonian mechanics) it won't pragmatically
matter whether it's only
an illusion.
$ The implication of this is that AM can be grown
incrementally, domain by domain. Adding expertise in a new domain
requires only the introduction of concepts local to that domain; all
the very general concepts -- and their heuristics -- already exist
and can be used with no change.
The author feels that this result can be generalized: AM can be
expanded in scope, even to non-mathematical fields of endeavor. In
each field, however, the rankings of the various heuristics$$ the
numeric values that
should be returned by
the local ratings formulae which are attached to the
heuristic rules. $ may shift slightly. As the domain
gets further away from mathematics, various heuristics are important
which were ignorable before (e.g., those dealing with ethics), and
some pure math research-oriented heuristics become less applicable
("giving up and moving on to another topic" is not an acceptable
response to the 15-puzzle, nor to a hostage situation).
Well, it sounds as if we've shifted our orientation from `Results' to
a subjective evaluation of those results. Let's start a new chapter
to legitimize this type of commentary.